CKS: Allow affinity group selection during cluster creation #12386

Open
Damans227 wants to merge 41 commits into apache:main from Damans227:implement-cks-node-affinity

@Damans227 (Contributor) commented Jan 7, 2026

Description

This PR adds support for specifying affinity groups during CKS (CloudStack Kubernetes Service) cluster creation, allowing users to control VM placement for high availability.

Changes:

  • New nodeaffinitygroups parameter for createKubernetesCluster API
  • Supports per-node-type (CONTROL, WORKER, ETCD) affinity group assignment
  • New kubernetes_cluster_affinity_group_map table for normalized storage

Design doc:

https://cwiki.apache.org/confluence/display/CLOUDSTACK/Allow+users+to+select+affinity+group+during+managed+CKS+cluster+creation

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

cmk-based API testing:

Screen recording

Screencast.from.2026-01-13.06-47-14.mp4

Screenshots

Screenshot from 2026-01-13 06-46-12 Screenshot from 2026-01-13 06-46-24 Screenshot from 2026-01-13 06-46-42

How Has This Been Tested?

How did you try to break this feature and the system with this change?

  • Invalid affinity group UUID
  • Invalid node type
  • Duplicate node type entries
  • Clusters without affinity groups (backward compatibility)
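
The negative cases listed above (invalid node type, duplicate node type entries, no affinity groups at all) can be illustrated with a small standalone sketch. This is not the PR's Java implementation: the function name `build_affinity_group_map` and the input shape (a list of dicts with `node` and `affinitygroup` keys) are assumptions made for illustration. Only the node type names CONTROL, WORKER, and ETCD come from the description.

```python
# Node types named in the PR description.
VALID_NODE_TYPES = {"CONTROL", "WORKER", "ETCD"}


def build_affinity_group_map(entries):
    """Validate per-node-type entries and return {node_type: [group ids]}.

    Hypothetical input shape: [{"node": "control", "affinitygroup": "id1,id2"}].
    An empty or missing list is allowed, so clusters without affinity
    groups keep working (backward compatibility).
    """
    result = {}
    for entry in entries or []:
        node_type = str(entry.get("node") or "").strip().upper()
        if node_type not in VALID_NODE_TYPES:
            raise ValueError("Invalid node type: %r" % entry.get("node"))
        if node_type in result:
            raise ValueError("Duplicate node type entry: %s" % node_type)
        raw_ids = str(entry.get("affinitygroup") or "")
        group_ids = [g.strip() for g in raw_ids.split(",") if g.strip()]
        if not group_ids:
            raise ValueError("No affinity group given for node type %s" % node_type)
        result[node_type] = group_ids
    return result
```

Whether an affinity group ID actually exists would still have to be checked against the management server; the sketch only covers the structural checks.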

Daman Arora added 21 commits January 6, 2026 08:51
…ndling and enhance node type validation tests
@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16294

codecov bot commented Jan 8, 2026

Codecov Report

❌ Patch coverage is 66.33333% with 101 lines in your changes missing coverage. Please review.
✅ Project coverage is 17.95%. Comparing base (c465caf) to head (d3b4b2c).
⚠️ Report is 221 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...bernetes/cluster/KubernetesClusterManagerImpl.java | 61.53% | 51 Missing and 4 partials ⚠️ |
| .../dao/KubernetesClusterAffinityGroupMapDaoImpl.java | 0.00% | 25 Missing ⚠️ |
| ...er/actionworkers/KubernetesClusterStartWorker.java | 0.00% | 5 Missing ⚠️ |
| ...kubernetes/cluster/CreateKubernetesClusterCmd.java | 0.00% | 5 Missing ⚠️ |
| ...s/cluster/KubernetesClusterAffinityGroupMapVO.java | 86.20% | 3 Missing and 1 partial ⚠️ |
| ...KubernetesClusterResourceModifierActionWorker.java | 0.00% | 3 Missing ⚠️ |
| ...ubernetes/cluster/KubernetesServiceHelperImpl.java | 96.55% | 1 Missing and 1 partial ⚠️ |
| .../actionworkers/KubernetesClusterDestroyWorker.java | 0.00% | 1 Missing ⚠️ |
| .../kubernetes/cluster/ScaleKubernetesClusterCmd.java | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #12386      +/-   ##
============================================
+ Coverage     17.76%   17.95%   +0.18%     
- Complexity    15859    16220     +361     
============================================
  Files          5923     5942      +19     
  Lines        530470   533525    +3055     
  Branches      64823    65281     +458     
============================================
+ Hits          94243    95796    +1553     
- Misses       425682   426982    +1300     
- Partials      10545    10747     +202     
Flag Coverage Δ
uitests 3.66% <ø> (+0.09%) ⬆️
unittests 19.07% <66.33%> (+0.21%) ⬆️

@DaanHoogland DaanHoogland added this to the 4.23.0 milestone Jan 8, 2026
@nvazquez (Contributor)

@blueorangutan package

@blueorangutan

@nvazquez a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@Damans227 Damans227 marked this pull request as ready for review February 11, 2026 15:26
@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16800

@nvazquez (Contributor)

@blueorangutan test

@blueorangutan

@nvazquez a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@DaanHoogland (Contributor) left a comment

clgtm (afaics). Will need some 3rd-person testing, but looks really promising.

@blueorangutan

[SF] Trillian test result (tid-15442)
Environment: kvm-ol8 (x2), zone: Advanced Networking with Mgmt server ol8
Total time taken: 53380 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr12386-t15442-kvm-ol8.zip
Smoke tests completed. 146 look OK, 4 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| ContextSuite context=TestListIdsParams>:teardown | Error | 1.13 | test_list_ids_parameter.py |
| test_01_snapshot_root_disk | Error | 6.03 | test_snapshots.py |
| test_02_list_snapshots_with_removed_data_store | Error | 48.84 | test_snapshots.py |
| test_02_list_snapshots_with_removed_data_store | Error | 48.84 | test_snapshots.py |
| ContextSuite context=TestSnapshotStandaloneBackup>:teardown | Error | 30.77 | test_snapshots.py |
| test_01_snapshot_usage | Error | 24.81 | test_usage.py |
| test_01_vpn_usage | Error | 1.09 | test_usage.py |
| test_01_redundant_vpc_site2site_vpn | Failure | 378.74 | test_vpc_vpn.py |

@Override
public List<Long> listAffinityGroupIdsByClusterIdAndNodeType(long clusterId, String nodeType) {
    List<KubernetesClusterAffinityGroupMapVO> maps = listByClusterIdAndNodeType(clusterId, nodeType);
    return maps.stream().map(KubernetesClusterAffinityGroupMapVO::getAffinityGroupId).collect(Collectors.toList());

Review comment (Contributor):

I think a null check is needed at this point, to cover cases in which no affinity group is passed for the node type.

}

protected List<Long> getAffinityGroupIdsForNodeType(KubernetesClusterNodeType nodeType) {
    return new ArrayList<>(kubernetesClusterAffinityGroupMapDao.listAffinityGroupIdsByClusterIdAndNodeType(

Review comment (Contributor):

Here as well, can we add checks for a null or empty list?

Review comment (Contributor):

Minor: can use CollectionUtils.isEmpty(affinityGroupIds) here

Long nodeHostId = node.getHostId();
String nodeHostName = getHostName(nodeHostId);

if ("host anti-affinity".equalsIgnoreCase(affinityGroupType)) {

Review comment (Contributor):

What about adding the following checks so that we do not rely on a string comparison?

  • Inject AffinityGroupService on the class level
  • Add the following method on AffinityGroupService interface: protected Map<String, AffinityGroupProcessor> getAffinityTypeToProcessorMap()
  • Mark this method as public on AffinityGroupServiceImpl (definition already exists)
  • Obtain the AffinityGroupProcessor for the affinity group type (adding null checks) and replace the string comparison with processor instanceof HostAntiAffinityProcessor and processor instanceof HostAffinityProcessor

What do you think?
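
The idea behind this suggestion, dispatching on the processor type instead of comparing type strings, can be sketched language-neutrally. The following Python analog is only an illustration, not CloudStack code: the classes are empty stand-ins for the Java processors named above, the `PROCESSORS` dict is a hypothetical stand-in for `getAffinityTypeToProcessorMap()`, and the `"host affinity"` key is an assumption (only `"host anti-affinity"` appears in the diff).

```python
class AffinityGroupProcessor:
    """Stand-in for the Java AffinityGroupProcessor interface."""


class HostAffinityProcessor(AffinityGroupProcessor):
    """Stand-in for the host affinity processor."""


class HostAntiAffinityProcessor(AffinityGroupProcessor):
    """Stand-in for the host anti-affinity processor."""


# Hypothetical registry: affinity group type -> processor instance.
PROCESSORS = {
    "host affinity": HostAffinityProcessor(),
    "host anti-affinity": HostAntiAffinityProcessor(),
}


def classify_affinity_group(group_type, processors=PROCESSORS):
    """Return a placement kind based on the processor type, not on string equality."""
    # Null checks: a missing type or an unregistered type yields "unknown".
    processor = processors.get(group_type) if group_type is not None else None
    if processor is None:
        return "unknown"
    if isinstance(processor, HostAntiAffinityProcessor):
        return "anti-affinity"
    if isinstance(processor, HostAffinityProcessor):
        return "affinity"
    return "other"
```

The type string is still used once, as a lookup key, but the branching logic then depends on the processor class, so a renamed or newly added type only has to be registered in one place.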

}

protected void validateNewNodesAntiAffinity(List<Long> nodeIds, AffinityGroupVO affinityGroup, KubernetesCluster cluster) {
    if (!"host anti-affinity".equalsIgnoreCase(affinityGroup.getType())) {

Review comment (Contributor):

Same here

@blueorangutan

@Damans227 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@Damans227 (Contributor, Author)

@blueorangutan package

@blueorangutan

@Damans227 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16863

Copilot AI (Contributor) left a comment

Pull request overview

This PR adds comprehensive support for affinity group selection during CKS (CloudStack Kubernetes Service) cluster creation. The feature allows users to specify different affinity groups for each node type (CONTROL, WORKER, ETCD), enabling fine-grained control over VM placement for high availability and performance optimization.

Changes:

  • New database table kubernetes_cluster_affinity_group_map for storing per-node-type affinity group associations
  • New API parameter nodeaffinitygroups in the createKubernetesCluster API supporting multiple affinity groups per node type
  • UI enhancements to allow affinity group selection for control, worker, and ETCD nodes in advanced mode
  • Backend logic to merge user-selected affinity groups with ExplicitDedication groups during VM provisioning
  • Comprehensive integration and unit tests covering various affinity group scenarios
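
The merge step mentioned in the list above (user-selected affinity groups combined with ExplicitDedication groups) can be sketched as an order-preserving union. This is a minimal illustration under assumptions: the function name is hypothetical, and the ordering (dedication groups first) and the de-duplication are guesses, not behavior confirmed by the PR.

```python
def merge_affinity_group_ids(dedication_ids, user_selected_ids):
    """Combine two ID lists, keeping first-seen order and dropping duplicates.

    Either argument may be None or empty, e.g. when no affinity group was
    passed for a node type.
    """
    merged = []
    seen = set()
    for group_id in list(dedication_ids or []) + list(user_selected_ids or []):
        if group_id not in seen:
            seen.add(group_id)
            merged.append(group_id)
    return merged
```

Treating `None` like an empty list keeps the callers free of special cases, which is also what the null and empty-list review comments on this PR ask for.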

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| engine/schema/src/main/resources/META-INF/db/schema-42210to42300.sql | Creates new table for affinity group mappings with proper foreign key constraints |
| api/src/main/java/org/apache/cloudstack/api/ApiConstants.java | Adds API constants for affinity group IDs and names response fields |
| api/src/main/java/com/cloud/vm/VmDetailConstants.java | Adds AFFINITY_GROUP constant for parameter mapping |
| api/src/main/java/com/cloud/kubernetes/cluster/KubernetesServiceHelper.java | Extends interface with getAffinityGroupNodeTypeMap method |
| plugins/.../KubernetesClusterAffinityGroupMapVO.java | New entity class for affinity group mappings |
| plugins/.../dao/KubernetesClusterAffinityGroupMapDao.java | DAO interface for affinity group mapping operations |
| plugins/.../dao/KubernetesClusterAffinityGroupMapDaoImpl.java | DAO implementation with search builders |
| plugins/.../KubernetesServiceHelperImpl.java | Implements affinity group validation and ID resolution logic |
| plugins/.../KubernetesClusterManagerImpl.java | Adds affinity group persistence, response building, and node validation logic |
| plugins/.../actionworkers/KubernetesClusterActionWorker.java | Adds methods to retrieve and merge affinity groups for node types |
| plugins/.../actionworkers/KubernetesClusterStartWorker.java | Updates VM creation to use merged affinity group lists |
| plugins/.../actionworkers/KubernetesClusterResourceModifierActionWorker.java | Updates worker node creation with affinity groups |
| plugins/.../actionworkers/KubernetesClusterDestroyWorker.java | Adds cleanup for affinity group mappings on cluster deletion |
| plugins/.../CreateKubernetesClusterCmd.java | Adds affinityGroupNodeTypeMap parameter and helper method |
| plugins/.../ScaleKubernetesClusterCmd.java | Renames variable from kubernetesClusterHelper to kubernetesServiceHelper |
| plugins/.../KubernetesClusterResponse.java | Adds response fields for affinity group IDs and names |
| plugins/.../spring-kubernetes-service-context.xml | Registers new DAO bean |
| ui/src/views/compute/CreateKubernetesCluster.vue | Adds UI controls for affinity group selection with account validation |
| ui/src/config/section/compute.js | Adds affinity group fields to cluster details view |
| ui/public/locales/en.json | Adds localization strings for affinity group labels |
| test/integration/component/test_kubernetes_cluster_affinity_groups.py | Comprehensive integration tests for affinity group functionality |
| plugins/.../test/.../KubernetesClusterActionWorkerTest.java | Unit tests for affinity group retrieval and merging |
| plugins/.../test/.../KubernetesServiceHelperImplTest.java | Unit tests for affinity group validation and mapping |
| plugins/.../test/.../KubernetesClusterManagerImplTest.java | Unit tests for affinity group response building and validation |
| plugins/.../test/.../KubernetesClusterAffinityGroupMapVOTest.java | Unit tests for the VO class |

Copilot AI commented Feb 18, 2026

Missing localization string for error message. The code references 'message.error.affinity.groups.different.accounts' but this key is not defined in en.json. Add the following entry to ui/public/locales/en.json:

"message.error.affinity.groups.different.accounts": "Affinity groups from different accounts cannot be used together"

Suggested change
- message: this.$t('message.error.affinity.groups.different.accounts')
+ message: 'Affinity groups from different accounts cannot be used together'
Comment on lines +399 to +452
    self.assertEqual(
        cluster.controlnodeaffinitygroupid,
        control_aff_grp.id,
        "Control node affinity group ID mismatch. Expected: {}, Got: {}".format(
            control_aff_grp.id, cluster.controlnodeaffinitygroupid)
    )
    self.assertEqual(
        cluster.controlnodeaffinitygroupname,
        control_aff_grp.name,
        "Control node affinity group name mismatch. Expected: {}, Got: {}".format(
            control_aff_grp.name, cluster.controlnodeaffinitygroupname)
    )
else:
    self.assertTrue(
        not hasattr(cluster, 'controlnodeaffinitygroupid') or cluster.controlnodeaffinitygroupid is None,
        "Control node affinity group should be None"
    )

if worker_aff_grp is not None:
    self.assertEqual(
        cluster.workernodeaffinitygroupid,
        worker_aff_grp.id,
        "Worker node affinity group ID mismatch. Expected: {}, Got: {}".format(
            worker_aff_grp.id, cluster.workernodeaffinitygroupid)
    )
    self.assertEqual(
        cluster.workernodeaffinitygroupname,
        worker_aff_grp.name,
        "Worker node affinity group name mismatch. Expected: {}, Got: {}".format(
            worker_aff_grp.name, cluster.workernodeaffinitygroupname)
    )
else:
    self.assertTrue(
        not hasattr(cluster, 'workernodeaffinitygroupid') or cluster.workernodeaffinitygroupid is None,
        "Worker node affinity group should be None"
    )

if etcd_aff_grp is not None:
    self.assertEqual(
        cluster.etcdnodeaffinitygroupid,
        etcd_aff_grp.id,
        "ETCD node affinity group ID mismatch. Expected: {}, Got: {}".format(
            etcd_aff_grp.id, cluster.etcdnodeaffinitygroupid)
    )
    self.assertEqual(
        cluster.etcdnodeaffinitygroupname,
        etcd_aff_grp.name,
        "ETCD node affinity group name mismatch. Expected: {}, Got: {}".format(
            etcd_aff_grp.name, cluster.etcdnodeaffinitygroupname)
    )
else:
    self.assertTrue(
        not hasattr(cluster, 'etcdnodeaffinitygroupid') or cluster.etcdnodeaffinitygroupid is None,
        "ETCD node affinity group should be None"

Copilot AI commented Feb 18, 2026

Test field name mismatch: The test is checking for singular fields like 'controlnodeaffinitygroupid' and 'controlnodeaffinitygroupname', but the API response only defines plural fields 'controlaffinitygroupids' and 'controlaffinitygroupnames' (which return comma-separated values). The test assertions at lines 400, 406, 420, 426, 438, and 444 will fail because these singular attributes don't exist in the KubernetesClusterResponse class.

Update the test to use the correct plural field names and handle CSV parsing. For example:

  • cluster.controlaffinitygroupids instead of cluster.controlnodeaffinitygroupid
  • Parse the CSV string and verify it contains the expected affinity group ID
Suggested change (replaces the lines quoted above):
    control_ids_csv = getattr(cluster, 'controlaffinitygroupids', None)
    control_names_csv = getattr(cluster, 'controlaffinitygroupnames', None)
    self.assertIsNotNone(
        control_ids_csv,
        "Control affinity group IDs should be present when a control affinity group is specified"
    )
    self.assertIsNotNone(
        control_names_csv,
        "Control affinity group names should be present when a control affinity group is specified"
    )
    control_ids = [v.strip() for v in str(control_ids_csv).split(',') if v.strip()]
    control_names = [v.strip() for v in str(control_names_csv).split(',') if v.strip()]
    self.assertIn(
        str(control_aff_grp.id),
        control_ids,
        "Control node affinity group ID mismatch. Expected to find ID {} in {}".format(
            control_aff_grp.id, control_ids_csv)
    )
    self.assertIn(
        control_aff_grp.name,
        control_names,
        "Control node affinity group name mismatch. Expected to find name '{}' in '{}'".format(
            control_aff_grp.name, control_names_csv)
    )
else:
    self.assertTrue(
        not hasattr(cluster, 'controlaffinitygroupids') or not getattr(cluster, 'controlaffinitygroupids'),
        "Control node affinity group should be None or empty"
    )
if worker_aff_grp is not None:
    worker_ids_csv = getattr(cluster, 'workeraffinitygroupids', None)
    worker_names_csv = getattr(cluster, 'workeraffinitygroupnames', None)
    self.assertIsNotNone(
        worker_ids_csv,
        "Worker affinity group IDs should be present when a worker affinity group is specified"
    )
    self.assertIsNotNone(
        worker_names_csv,
        "Worker affinity group names should be present when a worker affinity group is specified"
    )
    worker_ids = [v.strip() for v in str(worker_ids_csv).split(',') if v.strip()]
    worker_names = [v.strip() for v in str(worker_names_csv).split(',') if v.strip()]
    self.assertIn(
        str(worker_aff_grp.id),
        worker_ids,
        "Worker node affinity group ID mismatch. Expected to find ID {} in {}".format(
            worker_aff_grp.id, worker_ids_csv)
    )
    self.assertIn(
        worker_aff_grp.name,
        worker_names,
        "Worker node affinity group name mismatch. Expected to find name '{}' in '{}'".format(
            worker_aff_grp.name, worker_names_csv)
    )
else:
    self.assertTrue(
        not hasattr(cluster, 'workeraffinitygroupids') or not getattr(cluster, 'workeraffinitygroupids'),
        "Worker node affinity group should be None or empty"
    )
if etcd_aff_grp is not None:
    etcd_ids_csv = getattr(cluster, 'etcdaffinitygroupids', None)
    etcd_names_csv = getattr(cluster, 'etcdaffinitygroupnames', None)
    self.assertIsNotNone(
        etcd_ids_csv,
        "ETCD affinity group IDs should be present when an ETCD affinity group is specified"
    )
    self.assertIsNotNone(
        etcd_names_csv,
        "ETCD affinity group names should be present when an ETCD affinity group is specified"
    )
    etcd_ids = [v.strip() for v in str(etcd_ids_csv).split(',') if v.strip()]
    etcd_names = [v.strip() for v in str(etcd_names_csv).split(',') if v.strip()]
    self.assertIn(
        str(etcd_aff_grp.id),
        etcd_ids,
        "ETCD node affinity group ID mismatch. Expected to find ID {} in {}".format(
            etcd_aff_grp.id, etcd_ids_csv)
    )
    self.assertIn(
        etcd_aff_grp.name,
        etcd_names,
        "ETCD node affinity group name mismatch. Expected to find name '{}' in '{}'".format(
            etcd_aff_grp.name, etcd_names_csv)
    )
else:
    self.assertTrue(
        not hasattr(cluster, 'etcdaffinitygroupids') or not getattr(cluster, 'etcdaffinitygroupids'),
        "ETCD node affinity group should be None or empty"
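
The three near-identical branches in the suggestion above could be folded into one helper. The following is a minimal standalone sketch, assuming the response exposes `<prefix>affinitygroupids` and `<prefix>affinitygroupnames` as comma-separated strings as described in this comment. The helper names `split_csv` and `check_affinity_groups` are hypothetical, and plain `assert` statements stand in for the test case's `self.assert*` methods so the sketch runs on its own.

```python
def split_csv(value):
    """Split a comma-separated string into trimmed, non-empty items."""
    if not value:
        return []
    return [v.strip() for v in str(value).split(",") if v.strip()]


def check_affinity_groups(cluster, prefix, expected_group):
    """Check one node type ('control', 'worker' or 'etcd') on a cluster response.

    expected_group is None when no affinity group was requested for that
    node type; otherwise it needs .id and .name attributes.
    """
    ids = split_csv(getattr(cluster, prefix + "affinitygroupids", None))
    names = split_csv(getattr(cluster, prefix + "affinitygroupnames", None))
    if expected_group is None:
        assert not ids, "%s affinity groups should be None or empty" % prefix
        return
    assert str(expected_group.id) in ids, (
        "%s affinity group ID %s not found in %s" % (prefix, expected_group.id, ids))
    assert expected_group.name in names, (
        "%s affinity group name %r not found in %s" % (prefix, expected_group.name, names))
```

A test could then call `check_affinity_groups(cluster, "control", control_aff_grp)` once per node type instead of repeating the block three times.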
@Damans227 force-pushed the implement-cks-node-affinity branch from c0748bb to f06461c on February 18, 2026 16:13
@weizhouapache weizhouapache linked an issue Feb 19, 2026 that may be closed by this pull request
@Damans227 (Contributor, Author)

@blueorangutan package

@blueorangutan

@Damans227 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16885

Development

Successfully merging this pull request may close these issues.

Kubernetes nodes with affinity group

8 participants